Summary. In multiple testing, a variety of control metrics have been introduced, such as the
Family-Wise Error Rate (FWER), the False Discovery Rate (FDR), the False Exceedence Rate
(FER), etc. We found a way to embed these metrics into a continuous family of control metrics,
all of which can be attained by applying a simple and general family of multiple testing
procedures. The new general error rate (GER) limits the number of false positives relative to
an arbitrary non-decreasing function of the number of rejections. An example is $V/R^{\beta}$, the
number of false rejections divided by the number of rejections to a power $0 \le \beta \le 1$. We
investigated both the control of quantiles and of expectations and provide the corresponding
multiple testing procedures. In the above example, the expectation of the criterion thus leads to
a family of multiple testing procedures that bridges the gap between the FWER and the FDR.

Keywords: Multiple comparisons, Family-Wise Error Rate, False Discovery Rate, ordered p-values.

1. Introduction 

Multiple testing is a key problem in a variety of modern applications
such as genomics or neuroimaging. If m hypotheses are tested and if each hypothesis is
tested separately at significance level $\alpha$ (without considering multiplicity), the probability
of observing at least one false significant result (the FWER), when there is no real effect
and the tests are independent, is

$$\mathrm{FWER} = P(\text{at least one significant result}) = 1 - (1 - \alpha)^m. \tag{1}$$

For example, if m = 100 and $\alpha = 0.05$, this probability is 0.994. The expected number of
false positives (the Per Family Error Rate, PFER) is $\alpha m = 5$. Both quantities increase with
m and are out of control. Although it is evident that a correction for multiplicity should be
mandatory, a large number of claims are published without proper control. See Benjamini
(2010). A general outcome of multiple comparisons is summarized in Table 1. The idea
that the control of false positives V should be considered in conjunction with the number of
rejections R has been widely accepted by users after the introduction of the False Discovery
Proportion (FDP) $V/R$. Nevertheless, in situations where false discoveries have expensive
consequences, the FWER remains a viable criterion. We define our general error rate as
$V/s(R)$, where s(R) is a scaling function which typically grows more slowly than R itself. This
simple device covers and generalizes almost all the existing error rates. In section 4, we
propose procedures that control the general error rate under different assumptions. The
main results are the control of a tail probability by an adaptive step-down (SD) procedure
under the Simes (1986) inequality, the control of a tail probability under any assumption,
and the control of the expectation under independence using a step-up (SU) procedure.

Table 1. General outcome of a multiple comparisons situation. V denotes the number
of false positives and R the total number of rejections.
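The uncorrected error rates quoted in the introduction can be checked with a few lines of Python. This is only a sketch of equation (1) and of the PFER under the stated assumptions (all m hypotheses true, independent tests); the function name `uncorrected_error_rates` is ours and does not come from the paper.

```python
def uncorrected_error_rates(m, alpha):
    """Return (FWER, PFER) when m independent true nulls are each tested at level alpha."""
    fwer = 1.0 - (1.0 - alpha) ** m   # equation (1)
    pfer = alpha * m                  # expected number of false positives
    return fwer, pfer


fwer, pfer = uncorrected_error_rates(100, 0.05)
print(f"FWER = {fwer:.3f}, PFER = {pfer:.1f}")
```

Calling the function with larger m shows how quickly both quantities grow when no correction is applied.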

2. Historical background and motivation 

Traditional multiple comparisons procedures (MCPs) are designed to control the FWER
$= P(V > 0)$, where control at level $\alpha$ means that FWER $\le \alpha$. This is achieved by the
Bonferroni (1936) procedure, which performs each of the m tests at level $\alpha/m$ or, equivalently,
rejects the hypothesis $H_i$ ($i = 1, \ldots, m$) if its corresponding p-value is less than $\alpha/m$.
The Bonferroni procedure is the simplest and the strongest procedure in terms of control of
V. It even controls the PFER at level $\alpha$, which is stricter than control of the FWER. However,
when m grows, the power of the Bonferroni procedure at any fixed alternative tends to 0.

Many other MCPs that control the FWER have been proposed, although they typically
give only a slight improvement over the Bonferroni procedure. They all compare the ordered
p-values to thresholds which depend on the global control level $\alpha$ and the rank of the p-value.
A SU procedure compares the ordered p-values with the critical thresholds beginning with
the largest and thus the least significant p-value and starts to reject hypotheses once a p-value
is less than its corresponding threshold. After this first crossing, it rejects all hypotheses
with smaller p-values. A SD procedure begins the comparison with the smallest
or most significant p-value. If it is greater than its corresponding threshold, then no
hypothesis is rejected. Otherwise, it rejects hypotheses as long as their p-values are less than
their corresponding thresholds and stops rejecting once a p-value exceeds its corresponding
threshold. Examples of step-wise procedures include Holm (1979), Simes (1986), Hochberg
(1988). Note that a SU procedure has a power that equals or exceeds the power of a SD
procedure that uses the same critical thresholds. See Horn and Dunnett (2004).
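The difference between the two stepping directions can be made concrete with a small sketch. The helper names `step_down` and `step_up` are illustrative, and the comparison uses "less than or equal" for the threshold check, which is an assumption about boundary handling. Both functions take sorted p-values together with one threshold per rank and return the number of rejections.

```python
def step_down(pvalues, thresholds):
    """Step-down: start at the smallest p-value, stop at the first one above its threshold."""
    rejected = 0
    for p, t in zip(pvalues, thresholds):
        if p > t:
            break
        rejected += 1
    return rejected


def step_up(pvalues, thresholds):
    """Step-up: start at the largest p-value, reject from the first one at or below its threshold."""
    for i in range(len(pvalues) - 1, -1, -1):
        if pvalues[i] <= thresholds[i]:
            return i + 1   # this hypothesis and all with smaller p-values are rejected
    return 0


# Holm (step-down) and Hochberg (step-up) share the thresholds alpha / (m - i + 1).
alpha = 0.05
pvals = [0.02, 0.03, 0.04]                       # already sorted
m = len(pvals)
thresholds = [alpha / (m - i + 1) for i in range(1, m + 1)]
print(step_down(pvals, thresholds), step_up(pvals, thresholds))
```

With these three p-values the step-down version rejects nothing because the smallest p-value already exceeds $\alpha/3$, while the step-up version rejects all three, which illustrates the power ordering mentioned above.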

Safeguarding against false positives is not the only purpose of testing. Detecting real
effects is also of great importance. Benjamini and Hochberg (1995) introduced the FDR as
an alternative to the FWER with the aim of increasing power. The FDR is defined to be the
expected value of the FDP. We have FDP $= V/R$ if $R > 0$ and otherwise FDP $= 0$, while
the FDR is defined as FDR $= E(\mathrm{FDP}) = E[V/R \mid R > 0]\, P(R > 0)$. Since FDR $\le P[V > 0]$,
the FDR is less stringent than the FWER, which should lead to higher power. The FDR
has the same behavior as the FWER when all hypotheses are true, since this implies V = R.
Thus, the FWER is weakly controlled. Using an FDR controlling procedure, the number
of false positives increases with the number of rejections R. Despite this drawback, the
FDR has been widely adopted in many fields of application and it is fair to say that the
paper of Benjamini and Hochberg (1995) had a huge impact on the practice of statistics. It
has been cited more than 10,000 times up to now. Alternatives to the FDR are available.
Table 2. Summary of some existing error metrics.

                      Tail probability                      Expectation
Not dependent on R    FWER = $P(V > 0) \le \alpha$          PFER = $E(V) \le \alpha$
                      k-FWER = $P(V \ge k) \le \alpha$      PCER = $E(V/m) \le \alpha$
Dependent on R        FER = $P(V/R > \gamma) \le \alpha$    FDR = $E(V/R) \le \gamma$
                                                            k-FDR = $E(V\,1\{V \ge k\}/R) \le \gamma$
                                                            pFDR = $E(V/R \mid R > 0) \le \gamma$


Victor (1982), for example, considered the k-FWER $= P(V \ge k)$, which tolerates more
false positives and thus increases the power. This seems appropriate when the number of
hypotheses m is large. Hommel and Hoffmann (1988) and Lehmann and Romano (2005)
derived a single step and a step-down procedure to control the k-FWER. The single step
procedure is identical to the Bonferroni procedure except that the p-values are compared
to $k\alpha/m$ instead of $\alpha/m$. This procedure is evidently more powerful than the Bonferroni
procedure. However, the weak control of the FWER at level $\alpha$ is no longer guaranteed. In
fact, the expected number of false positives under the complete null hypothesis ($m = m_0$)
is $k\alpha$. Lehmann and Romano (2005) derived step-wise procedures to control another
alternative error metric, the FER, which is defined by $P(\mathrm{FDP} > \gamma)$ with $\gamma \in (0, 1)$. Many
other concepts of false positives error rates have been proposed in the literature. All these
concepts have a certain control of false positives situated in between two extremes, the Per
Comparison Error Rate (PCER) and the PFER control. Dudoit and van der Laan (2008)
and Benjamini (2010) are good sources for additional information.

The false positives metrics can be grouped by two important criteria. First, one can
distinguish between metrics that control the probability of exceeding a constant, such as the
FWER, the k-FWER or the FER, and metrics that control the expected value of a certain
quantity, such as the PFER, the FDR, the k-FDR or the positive FDR (pFDR) (Storey
(2002)). Second, one distinguishes between metrics that do not consider the number of
rejections, such as the FWER, the k-FWER or the PFER, and metrics that tolerate more
false positives as more hypotheses are rejected, such as the FDR, the pFDR or the FER.
Table 2 summarizes this information.

The error rates that control the proportion of false positives are especially appealing for 
large scale testing problems, compared to error rates that do not consider the number of 
rejections, as they remain stable when the number of tests m increases. See Dudoit and 
van der Laan (2008). 

The range of metrics is confusing, especially for non-experts. In addition, most of the
procedures need additional assumptions in order to attain the claimed control. If one or more
of these assumptions are not satisfied, a control failure results. According to Benjamini
(2010), none of the metrics is superior in all aspects. In fact, users may wish to have several
distinct controls achieved by one procedure. For example, one might be willing to derive a
powerful procedure that weakly controls the FWER. FDR control procedures seem ideal,
but at the risk of large values of V when R becomes large. To avoid this, one might want to
add strong control of the k-FWER as the single step of Hommel and Hoffmann (1988) does.
The general error rate offers such compromises. If we choose $s_k(R) = R$ for $R \le k$ and
$s_k(R) = k$ for $R > k$, the general error rate demands control of V/R up to a certain
number of rejections, but then switches to a fixed control of V/k thereafter (cut-off FDR).
For k = 1, this results in control of the FWER, while for $k = \infty$ we obtain the FDR. This
is another example of a family of scaling functions that bridges the gap between these two
extremes. We can easily show that for independent or positively dependent test statistics
(Benjamini and Yekutieli (2001)), the following procedure controls the FDR to be less than
$\gamma$ and controls the PFER to be less than $k\gamma$, which implies the control of the k-FWER at
level $\gamma$ since $P(V \ge k) \le E(V/k)$ by Markov's inequality.

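Procedure 2.1. We test m hypotheses. Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$ be the ordered
p-values, and denote by $H_{(i)}$ the null hypothesis that corresponds to $p_{(i)}$. Let $\hat\imath$ be the largest
i that satisfies $p_{(i)} \le \frac{i\gamma}{m}$ as well as $p_{(i)} \le \frac{k\gamma}{m}$, that is, $p_{(i)} \le \frac{s_k(i)}{m}\gamma$; then reject all $H_{(i)}$ with
$i = 1, 2, \ldots, \hat\imath$.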

Theorem 4.6 gives a general result of this type. The use of such a procedure would be
appropriate, particularly in fields where the number of hypotheses tested m is very large
(of the order of $10^5$ or even $10^6$). Examples are fMRI as well as some genomic studies. This
procedure controls the FDR when the number of rejections R is small, but limits
the expected number of false positives as R grows.
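A minimal sketch of the cut-off FDR step-up rule may help here. It assumes the thresholds are $\min(i, k)\gamma/m$ for the i-th smallest p-value, which is how we read Procedure 2.1; the function name `cutoff_fdr` and the toy p-values are illustrative only.

```python
def cutoff_fdr(pvalues, gamma, k):
    """Indices of hypotheses rejected by a step-up rule with thresholds min(i, k) * gamma / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])   # indices by increasing p-value
    last = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= min(rank, k) * gamma / m:
            last = rank            # remember the largest rank that passes its threshold
    return sorted(order[:last])


pvals = [0.001, 0.2, 0.004, 0.012, 0.9]
print(cutoff_fdr(pvals, gamma=0.05, k=1))   # thresholds never exceed gamma / m
print(cutoff_fdr(pvals, gamma=0.05, k=2))   # thresholds grow with the rank up to 2 * gamma / m
```

Setting k to m or more leaves the thresholds at $i\gamma/m$ for every rank, so the same function then applies the usual Benjamini and Hochberg thresholds.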

For illustration, consider a case where $m = 10^4$ independent tests are performed whose
distribution under the null is $\mathcal{N}(0, 1)$. Among the m tests, $m_1 = 5 \times 10^2$ correspond to false
hypotheses, in which case the distribution of the test statistics is $\mathcal{N}(\Delta, 1)$, with $\Delta = 2$. By
setting $\gamma = 0.05$ and $k\gamma = 1$, the power of the FDR and the cut-off FDR are 0.0505 and
0.0413, and the expected number of false positives E(V) are 1.359 and 0.881. When the
effect $\Delta$ grows to 4, the respective powers are 0.879 and 0.611 and the expected numbers
of false positives are 21.791 and 0.991. Clearly, the FDR achieves its "superior" power by
also rejecting a high number of true hypotheses. In many situations, having around 20 false
positives is completely out of the question. This example shows that the cut-off FDR gives
an added protection, often without a high cost in terms of power. Note that the theoretical
value of the power of the Bonferroni procedure in this situation is 0.00783 and 0.3383 for
$\Delta = 2$ and $\Delta = 4$ respectively.
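The two Bonferroni power values quoted above can be reproduced from the closed form $1 - \Phi(\Phi^{-1}(1 - \alpha/m) - \Delta)$. The sketch below assumes a one-sided test of a unit-variance normal statistic at per-test level 0.05/m; the function name `bonferroni_power` is ours.

```python
from statistics import NormalDist


def bonferroni_power(m, alpha, delta):
    """Power of a one-sided Bonferroni test when the statistic is N(delta, 1) under the alternative."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha / m)   # rejection cut-off on the z scale
    return 1.0 - std_normal.cdf(critical - delta)


for delta in (2, 4):
    print(delta, round(bonferroni_power(10**4, 0.05, delta), 4))
```

The same function shows how the power at a fixed alternative shrinks as m grows.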

3. The general error rate 

The FDP takes into account the number of rejections when controlling false positives. This
concept has a Bayesian background in the sense that R contains information about $m_1$,
which should be exploited. Our general error rate provides a generalization of the FDP.

Definition 3.1. The scaled or general False Discovery Proportion sFDP with parameter
$l \ge 1$ and non-decreasing scaling function $s : \{1, \ldots, m\} \to (0, \infty)$ is defined as

$$\mathrm{sFDP}_l = \begin{cases} \dfrac{V}{s(R)}, & \text{if } V \ge l, \\[1ex] 0, & \text{otherwise.} \end{cases}$$



Based on this quantity we define two types of general error rates using two different
stochastic functions. The first uses the tail probability of exceeding a constant, which we denote as
scaled or general Tail Probability (sTP). The second uses the expected value of the sFDP,
which we denote as scaled or general expected value (sEV). Remotely related concepts were
introduced in van der Laan et al. (2004) and described in Dudoit and van der Laan (2008).
These authors consider transformations which involve both V and R, while we concentrate
on the denominator. The reason for doing so can be gleaned from




$$\frac{V}{s(R)} = \frac{V}{R} \cdot \frac{R}{s(R)}, \tag{2}$$

which shows that control of the $\mathrm{sFDP}_1$ (sFDP from now on) is equivalent to control of the
FDP times a positive multiplier that depends on R and that could be greater or less than 
1 depending on the level of conservativeness that the researcher desires. In addition, when 
the scaling function s is a constant, the sFDP depends only on V. This means that in this 
case, the relation between the number of false positives V and the number of rejections R 
is suppressed. 



3.1. The scaled Tail Probability (sTP) error rate

Many error rates introduced in the literature are tail probabilities of exceeding a certain
threshold, such as the FWER, the k-FWER and the FER. We define the $\mathrm{sTP}_\gamma$ as the probability
that the sFDP exceeds a non-negative constant $\gamma \ge 0$,

$$\mathrm{sTP}_\gamma = P[\mathrm{sFDP} > \gamma].$$

$\mathrm{sTP}_\gamma$ with $s(R) = R$ and $\gamma \in (0, 1)$ is identical to the FER, while $\mathrm{sTP}_\gamma$ with $s(R) \cdot \gamma =
k - 1$ is identical to the k-FWER. In addition, $\mathrm{sTP}_\gamma$ with $s(R) > 0$ and $\gamma = 0$ becomes
$P(V > 0)$, the FWER.

The control of the $\mathrm{sTP}_\gamma$ implies the control of quantiles of the sFDP, because $\mathrm{sTP}_\gamma \le \alpha$
implies that the $1 - \alpha$ quantile of the sFDP does not exceed $\gamma$. In particular, when $\alpha = 0.5$,
$\mathrm{sTP}_\gamma \le \alpha$ is equivalent to median(sFDP) $\le \gamma$.



3.2. The scaled Expected Value (sEV) error rate 

To encompass some of the false positives error rates which are defined as the expected
value of a certain random quantity, such as the PFER and the FDR, we define the sEV
by

$$\mathrm{sEV} = E[\mathrm{sFDP}_l].$$

When $l = 1$, sEV with $s(R) = R$ is identical to the FDR of Benjamini and Hochberg
(1995), while for $l = k > 1$ and $s(R) = R$, it is identical to the k-FDR of Sarkar (2007).
In addition, sEV with $s(R) = 1$ and $l = 1$ becomes E(V), which is the PFER.
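The special cases listed above can be illustrated with a small helper that evaluates the scaled proportion for given counts. This sketch assumes that the sFDP with parameter l equals V/s(R) when V is at least l and 0 otherwise, which is our reading of the definition based on the FDR and k-FDR special cases; the name `sfdp` and the example counts are illustrative.

```python
def sfdp(v, r, s, l=1):
    """Scaled false discovery proportion for v false rejections among r rejections.

    Assumption: the value is v / s(r) when v >= l, and 0 otherwise (including r == 0).
    """
    if r == 0 or v < l:
        return 0.0
    return v / s(r)


v, r = 3, 10
print(sfdp(v, r, lambda n: n))          # s(R) = R: the usual FDP
print(sfdp(v, r, lambda n: 1))          # s(R) = 1: the number of false positives
print(sfdp(v, r, lambda n: n ** 0.8))   # s(R) = R^0.8: in between the two
print(sfdp(1, r, lambda n: n, l=2))     # fewer than l false positives are not counted
```

Averaging these values over repeated experiments would estimate the corresponding sEV.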

One can also generalize some other concepts in the same way. For example, we define 
the positive sEV by 

$$E\left[\frac{V}{s(R)} \,\middle|\, R > 0\right].$$



Note that for both sTP and sEV, when the scaling function is a constant, we find the
error rates with names containing the word "family". Furthermore, if $m = m_0$,

$$\mathrm{sEV} = E\left[\frac{R}{s(R)} \,\middle|\, R > 0\right] P[R > 0] = E\left[\frac{R}{s(R)} \,\middle|\, R > 0\right] \times \mathrm{FWER}. \tag{3}$$

This shows that if $s(R) \le R$ for any R in $\{1, \ldots, m\}$, the control of the sEV at level $\gamma$ implies
the weak control of the FWER at level $\gamma$.



6 D. E. Meskaldji, J.-Ph. Thiran and S. Morgenthaler 



4. Control procedures 

In this section we present generalizations of some existing procedures to control the sTP
and the sEV. The proposed procedures are not limited to particular choices but apply to any
choice of the scaling function. We show that some of the existing procedures can be simply
modified to give a more general control.



4.1. Procedures that control the $\mathrm{sTP}_\gamma$

4.1.1. The generalized Lehmann and Romano procedure 

Lehmann and Romano (2005) proposed a SD procedure to control the FER. Here, we give 
a simply modified version of their procedure to control the sTP. 

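Procedure 4.1. Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$ be the ordered p-values of m tests, and
denote by $H_{(i)}$ the null hypothesis that corresponds to $p_{(i)}$. Set

$$\alpha_i = \begin{cases} \dfrac{\lfloor \gamma s(i) \rfloor + 1}{m}\,\alpha, & \text{if } i \le \lfloor \gamma s(i) \rfloor + 1; \\[2ex] \dfrac{\lfloor \gamma s(i) \rfloor + 1}{m + \lfloor \gamma s(i) \rfloor + 1 - i}\,\alpha, & \text{if } i > \lfloor \gamma s(i) \rfloor + 1. \end{cases} \tag{4}$$

If $p_{(1)} > \alpha_1$, then reject no hypothesis; otherwise, reject all hypotheses $H_{(i)}$, $i = 1, \ldots, \hat\imath$, where
$\hat\imath$ is the largest index satisfying

$$p_{(1)} \le \alpha_1, \ldots, p_{(\hat\imath)} \le \alpha_{\hat\imath}. \tag{5}$$
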

Note that if $s(i) = i$ and $0 < \gamma < 1$, this SD procedure is equal to the procedure
proposed by Lehmann and Romano (2005) (LR05 from now on) for controlling the FER.
Furthermore, if s(i) is a constant and $\gamma s(i) = k - 1$, we find the critical values of the SD
procedure of Lehmann and Romano (2005) to control the k-FWER and of course, if $\gamma = 0$,
we find the Holm procedure that controls the FWER.
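The critical values and the step-down rule of Procedure 4.1 can be sketched in Python as follows. The sketch assumes that the critical value at rank i is $(\lfloor \gamma s(i) \rfloor + 1)\alpha$ divided by m when $i \le \lfloor \gamma s(i) \rfloor + 1$ and by $m + \lfloor \gamma s(i) \rfloor + 1 - i$ otherwise; the function names are ours, and floating-point rounding in `gamma * s(i)` is ignored.

```python
import math


def stp_critical_values(m, alpha, gamma, s):
    """Step-down critical values alpha_1, ..., alpha_m for a scaling function s."""
    crit = []
    for i in range(1, m + 1):
        k = math.floor(gamma * s(i)) + 1          # false rejections needed to exceed gamma
        denom = m if i <= k else m + k - i
        crit.append(k * alpha / denom)
    return crit


def stp_step_down(pvalues, alpha, gamma, s):
    """Number of rejections of the step-down rule using the critical values above."""
    crit = stp_critical_values(len(pvalues), alpha, gamma, s)
    rejected = 0
    for p, a in zip(sorted(pvalues), crit):
        if p > a:
            break
        rejected += 1
    return rejected


pvals = [0.03, 0.01, 0.2, 0.02]
print(stp_step_down(pvals, 0.05, 0.0, lambda i: i))   # gamma = 0: Holm thresholds
print(stp_step_down(pvals, 0.05, 0.5, lambda i: i))   # s(i) = i: thresholds of the LR05 type
```

With $\gamma = 0$ the thresholds reduce to $\alpha/(m - i + 1)$, so the first call reproduces the Holm procedure on these p-values.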

We already know by Lehmann and Romano (2005) that for the case where $\gamma s(i)$ is
constant, the procedure defined above controls the $\mathrm{sTP}_\gamma$ at level $\alpha$, under any dependency
assumption on the p-values. The following theorem states the control in the case where
$\gamma s(i)$ is not a constant.

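Theorem 4.2. Denote by $q_{(1)} \le \cdots \le q_{(m_0)}$ the ordered p-values corresponding to the
$m_0$ true null hypotheses. Set $M = \min\{\lfloor \gamma s(m) \rfloor + 1, m_0\}$.
(i) For the step-down procedure with $\alpha_i$ defined in Procedure 4.1, we have

$$P[\mathrm{sFDP} > \gamma] \le P\left[\bigcup_{k=\lfloor \gamma s(1) \rfloor + 1}^{M} \left\{ q_{(k)} \le \frac{k\alpha}{m_0} \right\}\right]. \tag{6}$$

(ii) Therefore, if the joint distribution of the p-values corresponding to the null hypotheses
satisfies the Simes inequality, that is

$$P\left[\left\{q_{(1)} \le \frac{\alpha}{m_0}\right\} \cup \left\{q_{(2)} \le \frac{2\alpha}{m_0}\right\} \cup \cdots \cup \left\{q_{(m_0)} \le \alpha\right\}\right] \le \alpha, \tag{7}$$

then $P[\mathrm{sFDP} > \gamma] \le \alpha$.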
The Simes inequality holds for many joint distributions of positively dependent variables.
Sarkar (1998) showed that the Simes inequality holds for any multivariate totally positive
distribution of order 2 ($\mathrm{MTP}_2$). Obviously, the condition that the right side of (6) is at most
$\alpha$ is less strict than the Simes inequality condition. For the particular case where
$\lfloor \gamma s(i) \rfloor + 1 = c$ (a constant), the right side of (6) is bounded by $\alpha$ for any dependency
structure of the p-values. To show this, note that

$$P\left[\bigcup_{k=\lfloor \gamma s(1) \rfloor + 1}^{M} \left\{ q_{(k)} \le \frac{k\alpha}{m_0} \right\}\right] = P\left[\bigcup_{k=c}^{c} \left\{ q_{(k)} \le \frac{k\alpha}{m_0} \right\}\right] = P\left[ q_{(c)} \le \frac{c\alpha}{m_0} \right] \le \alpha. \tag{8}$$


In the general case, that is, when $\lfloor \gamma s(i) \rfloor + 1$ is not a constant, the following lemma
stated by Lehmann and Romano (2005) can be used to give a sharp upper bound for the
right side of equation (6).

Lemma 4.3. (Lemma 3.1 in Lehmann and Romano (2005)) Let $p_1, \ldots, p_n$ be n p-values
that satisfy $P\{p_i \le u\} \le u$ for all $i = 1, \ldots, n$ and for any $u \in (0, 1)$. Let $0 = \beta_0 \le \beta_1 \le \beta_2 \le
\cdots \le \beta_h \le 1$ for some $1 \le h \le n$. Then

$$P\left[\bigcup_{i=1}^{h} \left\{ p_{(i)} \le \beta_i \right\}\right] \le n \sum_{i=1}^{h} \frac{\beta_i - \beta_{i-1}}{i}.$$


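The previous lemma leads to the following result.

Theorem 4.4. If the critical values $\alpha_i$ are replaced by

$$\alpha_i' = \frac{\alpha_i}{C_{(\lfloor \gamma s(1) \rfloor + 1,\, \lfloor \gamma s(m) \rfloor)}} \tag{9}$$

with $C_{(l,h)} = \sum_{i=l}^{h} \frac{1}{i}$, then $P\{\mathrm{sFDP} > \gamma\} \le \alpha$ for any dependency of the p-values
corresponding to the true null hypotheses.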

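Proof. By replacing in Lemma 4.3 h and n by M and $m_0$ respectively, and by setting
$\beta_i = 0$ for $i = 1, \ldots, \lfloor \gamma s(1) \rfloor$ and $\beta_i = \frac{i\alpha}{m_0}$ for $i = \lfloor \gamma s(1) \rfloor + 1, \ldots, M$, we obtain

$$P[\mathrm{sFDP} > \gamma] \le P\left[\bigcup_{i=\lfloor \gamma s(1) \rfloor + 1}^{M} \left\{ q_{(i)} \le \frac{i\alpha}{m_0} \right\}\right] \tag{10}$$

$$\le m_0 \sum_{i=\lfloor \gamma s(1) \rfloor + 1}^{M} \left(\frac{\alpha}{m_0}\right) \frac{1}{i} \tag{11}$$

$$= \alpha \sum_{i=\lfloor \gamma s(1) \rfloor + 1}^{M} \frac{1}{i}. \tag{12}$$

It suffices then to replace $\alpha$ by $\alpha / C_{(\lfloor \gamma s(1) \rfloor + 1,\, \lfloor \gamma s(m) \rfloor)}$ to have $P[\mathrm{sFDP} > \gamma]$ bounded by $\alpha$.
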

The constant $C_{(\lfloor \gamma s(1) \rfloor + 1,\, \lfloor \gamma s(m) \rfloor)}$ is usually greater than 1, which means that the control
under any assumption is stricter than under the Simes inequality. This constant may be
less than one in some particular cases depending on the value of $\lfloor \gamma s(1) \rfloor + 1$, but this could
happen only when $\gamma s(1)$ is greater than 1, which is less frequent. In addition, if the lower
index is 1, the constant C is greater than 1. Depending on the scaling function and the
value of $\lfloor \gamma s(m) \rfloor$, the constant C could be greater or smaller than the one proposed in
Lehmann and Romano (2005).
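The behaviour of the constant described above is easy to inspect numerically. The sketch below assumes that $C_{(l,h)}$ is the partial harmonic sum of $1/i$ for i from l to h; the function name `harmonic_constant` and the index values are illustrative and are not taken from the paper.

```python
def harmonic_constant(lower, upper):
    """Partial harmonic sum C_(lower, upper) = sum of 1/i for i = lower, ..., upper."""
    return sum(1.0 / i for i in range(lower, upper + 1))


def adjusted_critical_values(crit, lower, upper):
    """Divide each critical value by the constant to obtain dependence-free control."""
    c = harmonic_constant(lower, upper)
    return [a / c for a in crit]


print(harmonic_constant(1, 6))   # lower index 1 with several terms: constant above 1
print(harmonic_constant(2, 2))   # lower index above 1: the constant can drop below 1
print(adjusted_critical_values([0.0125, 0.025, 0.05], 1, 2))
```

Dividing by a constant above 1 shrinks every critical value, which is the price paid for dropping the Simes assumption.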
4.2. Procedures that control the sEV

4.2.1. The generalized Benjamini and Hochberg procedure 

The Benjamini and Hochberg (1995) procedure can be simply modified to obtain a procedure 
that controls the sEV. 

Procedure 4.5. Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$ be the ordered p-values of m tests, and
denote by $H_{(i)}$ the null hypothesis that corresponds to $p_{(i)}$. Let $\hat\imath$ be the largest i that satisfies

$p_{(i)} \le \frac{s(i)}{m}\gamma$; then reject all $H_{(i)}$ with $i = 1, 2, \ldots, \hat\imath$.

Note that if $s(i) = i$, the procedure is the same as the procedure proposed by Benjamini
and Hochberg (1995) (BH95 from now on) to control the FDR. If $s(i) = 1$, the procedure
becomes the Bonferroni procedure. Furthermore, if $s(i) = k$, we find the single step
procedure proposed by Hommel and Hoffmann (1988) to control the PFER. In the two latter
cases, the procedure is a single step procedure and there is no need to order the p-values.
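A short sketch shows how a single step-up routine covers these special cases once the scaling function is passed as an argument. It assumes the threshold at rank i is $s(i)\gamma/m$; the function name `sev_step_up` and the example p-values are illustrative.

```python
def sev_step_up(pvalues, gamma, s):
    """Number of rejections of the step-up rule with thresholds s(i) * gamma / m."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    rejected = 0
    for i in range(1, m + 1):
        if ordered[i - 1] <= s(i) * gamma / m:
            rejected = i          # keep the largest rank whose p-value passes
    return rejected


pvals = [0.001, 0.004, 0.027, 0.2, 0.9]
print(sev_step_up(pvals, 0.05, lambda i: i))          # s(i) = i: BH95 thresholds
print(sev_step_up(pvals, 0.05, lambda i: 1))          # s(i) = 1: Bonferroni thresholds
print(sev_step_up(pvals, 0.05, lambda i: i ** 0.8))   # s(i) = i^0.8: in between
```

On these p-values the third hypothesis is rejected only under the BH95 thresholds, which shows how a slower growing scaling function tightens the rule at higher ranks.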

Theorem 4.6. For independent test statistics, the procedure defined above strongly
controls the sEV at level $\frac{m_0}{m}\gamma$.

Proof. The proof of this theorem is a straightforward consequence of the following 
lemma. 

Lemma 4.7. (Generalization of the main lemma in Benjamini and Hochberg (1995))
For any $0 \le m_0 \le m$ independent p-values corresponding to the true null hypotheses, and
for any values that the $m_1 = m - m_0$ p-values corresponding to the false null hypotheses
can take, Procedure 4.5 satisfies

$$E(\mathrm{sFDP} \mid p_{m_0+1}, \ldots, p_m) \le \frac{m_0}{m}\gamma.$$

In Benjamini and Yekutieli (2001), a more general and simpler proof of the FDR control
is provided. This proof can be generalized to prove the control of the sEV under positive
dependency.

5. Simulations

When comparing multiple comparison procedures, one must set a common measure of
performance and a common measure for the safeguards against false positives. We performed
a limited simulation study in order to show the interest in using scaling functions that lie
in between the FWER and the FDR. For this purpose, we use the average power as our
measure of performance and the expected number of false positives as a measure of the
safeguard against false rejections. We restricted our investigation to the case where the
null distribution of the test statistic is a standard normal and under the alternative the
distribution is a shifted normal with mean $\Delta > 0$ and variance 1. The number of tests m is
either 1,000 or 10,000 and the number of alternative hypotheses (or false null hypotheses)
is $m_1 = \pi m$. The control levels are $\alpha = 0.5$ and $\gamma = 0.05$. For the sTP tests including the
FER, the parameter $\alpha = 0.5$ means that the median of the sFDP is controlled to be less
than $\gamma$.

The two scaling functions we consider are members of the families mentioned before,
that is,

$$s_1(R) = \min(R, k) \quad \text{and} \quad s_2(R) = R^{\beta}.$$

The constants of the scaling functions were chosen as $k = 1/\gamma$ for m = 1,000 and
$k = 2.5/\gamma$ for m = 10,000. Note that as long as $E(V) < k\gamma$, this scaling function gives
the same control as the FDR, but will not allow E(V) to grow beyond $k\gamma$. For the second
function we chose $\beta = 0.8$.

Fig. 1. Average power and expected number of false rejections for m = 1,000 and $\Delta = 2$. Bonferroni
(horizontal line), LR05 or BH95 (continuous line), sTP or sEV with $s_1$ (point dashed) and with $s_2$
(dashed). The control levels are $\gamma = 0.05$ and $\alpha = 0.5$.

The general error rate procedures were compared to some existing standards, Bonferroni, 
BH95 and LR05. 

Figures 1 through 4 show the simulated average power and expected false positives E(V).
The power of the Bonferroni procedure with $\alpha = 0.05$ can be computed analytically and is
equal to

$$\mathrm{POW}_{\mathrm{Bonf}} = 1 - \Phi\left(\Phi^{-1}\left(1 - \frac{\alpha}{m}\right) - \Delta\right). \tag{13}$$

The construction of the figures is always the same. There are four panels, each plotting
a performance or safety measure as a function of $\pi$, the fraction of false null hypotheses.
The top row of the panels contains the scaled tail probability procedures and uses the LR05 
procedure as a standard. These are step-down tests. The bottom row shows the scaled 
expected value procedures and uses the BH95 for comparison. These are step-up tests. The 
left-hand column shows the average power (the performance indicator) and the right-hand 
column depicts E(V) (the safety indicator). 

As expected, the scaled tests are in between the extremes defined by Bonferroni and
BH95 or by LR05, since the scaling functions are chosen to be in between the horizontal line
that corresponds to the Bonferroni procedure and the line with slope $\gamma/m$ that represents
BH95 or LR05. The performance of the scaled tests is close to that of BH95 or LR05 when
the effect to be detected is clear-cut ($\Delta = 4$). However, they are much safer to use than
BH95 and LR05, in particular when using the cut-off function, which affords a direct choice of a
threshold that the number of false positives must not exceed. In the two cases, m = 1,000
or 10,000, where the effect to be detected is small ($\Delta = 2$), the performance of the scaled
tests is weaker, in particular when compared to BH95 and LR05, but these procedures pay a
heavy price in terms of false rejections as $\pi$ grows. Finally, these simulations demonstrate
the importance of including the new general error rates, which bridge the gap between the
PFER and the FWER, and the FDR and the FER, especially when the number of tests
performed is large, in which case the number of false positives generated by the FDR and
FER control procedures becomes intolerable.

Fig. 2. Average power and expected number of false rejections for m = 1,000 and $\Delta = 4$. Bonferroni
(horizontal line), LR05 or BH95 (continuous line), sTP or sEV with $s_1$ (point dashed) and with $s_2$
(dashed). The control levels are $\gamma = 0.05$ and $\alpha = 0.5$.

Fig. 3. Average power and expected number of false rejections for m = 10,000 and $\Delta = 2$. Bonferroni
(horizontal line), LR05 or BH95 (continuous line), sTP or sEV with $s_1$ (point dashed) and with $s_2$
(dashed). The control levels are $\gamma = 0.05$ and $\alpha = 0.5$.

Fig. 4. Average power and expected number of false rejections for m = 10,000 and $\Delta = 4$. Bonferroni
(horizontal line), LR05 or BH95 (continuous line), sTP or sEV with $s_1$ (point dashed) and with $s_2$
(dashed). The control levels are $\gamma = 0.05$ and $\alpha = 0.5$.

6. Conclusion 

We introduced a new quantity, the scaled false discovery proportion sFDP, and we defined
two metrics of false positives, the sTP and the sEV, that bridge the gap between the FWER
and the FER and between the PFER and the FDR. For particular choices of the scaling
function, the two metrics generalize the existing error rates. The new metrics offer the
user a large range of control by varying the scaling function depending on the desired
level of conservativeness. We also proposed some procedures to control either the sTP or
the sEV under different assumptions. Other existing procedures can be generalized in the
same way as presented in this paper. Two families of scaling functions were proposed to
the user. We provided simulations that contrast the proposed procedures with standard ones
in terms of power and control of false positives. The simulations showed the importance
of including the new general error rates, especially when the number of tests is very large,
which is often encountered in modern applications.



7. Appendix 



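Proof. Theorem (4.2) The proof is based on the method of Lehmann and Romano
(2005). The event $[\mathrm{sFDP} > \gamma]$ occurs only if, for at least one random index i, the quantity
sFDP exceeds $\gamma$. Among these indexes, denote the smallest one by j. Then $P[\mathrm{sFDP} > \gamma] \le
P[\text{such } j \text{ exists}]$. The range of the possible values of $\gamma s(j)$ ($0 < \gamma < 1$, $1 \le j \le m$) is divided
into $\lfloor \gamma s(1) \rfloor \le \gamma s(j) < \lfloor \gamma s(1) \rfloor + 1$, $\lfloor \gamma s(1) \rfloor + 1 \le \gamma s(j) < \lfloor \gamma s(1) \rfloor + 2$, ...,
$\lfloor \gamma s(m) \rfloor \le \gamma s(j) < \lfloor \gamma s(m) \rfloor + 1$.

Because of the definition of j, we must have $p_{(j)} \le \alpha_j$ and $\lfloor \gamma s(j) \rfloor + 1 \le m_0$.
Therefore,

$$P[\mathrm{sFDP} > \gamma] \le P\big[\{\lfloor \gamma s(1) \rfloor \le \gamma s(j) < \lfloor \gamma s(1) \rfloor + 1\} \cup \{\lfloor \gamma s(1) \rfloor + 1 \le \gamma s(j) < \lfloor \gamma s(1) \rfloor + 2\} \cup \cdots \cup \{M - 1 \le \gamma s(j) < M\}\big],$$

with $M = \min\{\lfloor \gamma s(m) \rfloor + 1, m_0\}$.

Let $k - 1 \le \gamma s(j) < k$ for k in $\{\lfloor \gamma s(1) \rfloor + 1, \ldots, M\}$. Then $p_{(j)} = q_{(k)} \le \alpha_j$ because
the sFDP does not exceed $\gamma$ at step $j - 1$ and exceeds $\gamma$ at step j. This implies that $H_{(j)}$ is the
kth rejected true hypothesis, and $k \le j \le m - (m_0 - k)$, which implies that $m_0 \le m + k - j$.
Therefore, if $k - 1 \le \gamma s(j) < k$, the event $\{\mathrm{sFDP} > \gamma\}$ at step j implies that $q_{(k)} \le \frac{k\alpha}{m_0}$. So,

$$P[\mathrm{sFDP} > \gamma] \le P\left[\bigcup_{k=\lfloor \gamma s(1) \rfloor + 1}^{M} \left\{ q_{(k)} \le \frac{k\alpha}{m_0},\; k - 1 \le \gamma s(j) < k \right\}\right] \le P\left[\bigcup_{k=\lfloor \gamma s(1) \rfloor + 1}^{M} \left\{ q_{(k)} \le \frac{k\alpha}{m_0} \right\}\right].$$
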

Part (ii) follows trivially. 



Proof. Lemma (4.7) Our approach is based on the method of Benjamini and Hochberg
(1995). The proof of this claim is by induction on m. Note that when $m_0 = 0$, sFDP is
identically 0. In this case, the claim is true for any value of m. So, we treat the case $m_0 \ge 1$.
The case m = 1. Two cases.

1. If R = 0 then sFDP = 0.

2. If R = 1 then V = 1. This leads to

$$\mathrm{sFDP} = \begin{cases} 1/s(1) & \text{with probability } s(1) \cdot \gamma, \\ 0 & \text{with probability } 1 - s(1) \cdot \gamma. \end{cases}$$

It follows that

$$\mathrm{sEV} = E(\mathrm{sFDP}) = \frac{1}{s(1)} \times s(1) \cdot \gamma + 0 \le \frac{1}{1}\gamma = \frac{m_0}{m}\gamma.$$

The case m > 1. Suppose that the claim is true for any $m' \le m$. We have to show that
the claim holds for m + 1.
Denote by $q_{(1)}, \ldots, q_{(m_0)}$ the p-values that correspond to the true hypotheses and, without
loss of generality, denote by $r_1, \ldots, r_{m_1}$ ($m_1 = m + 1 - m_0$) the ordered p-values that
correspond to the false hypotheses. Define $j_0$ by

$$j_0 = \max\left\{ 1 \le j \le m_1 : r_j \le \frac{s(m_0 + j)}{m + 1}\gamma \right\}.$$

Here, $j_0$ is well defined because s is a non-decreasing function. We set $p' = \frac{s(m_0 + j_0)}{m + 1}\gamma$.
$q_{(m_0)}$ is either $> p'$ or $\le p'$. Then,

$$E(\mathrm{sFDP} \mid P_{m_0+1} = r_1, \ldots, P_m = r_{m_1}) = \int_0^{p'} E(\mathrm{sFDP} \mid P_{m_0+1} = r_1, \ldots, P_m = r_{m_1}, q_{(m_0)} = p)\, f_{q_{(m_0)}}(p)\, dp$$
$$\qquad + \int_{p'}^{1} E(\mathrm{sFDP} \mid P_{m_0+1} = r_1, \ldots, P_m = r_{m_1}, q_{(m_0)} = p)\, f_{q_{(m_0)}}(p)\, dp = I + II,$$

with $f_{q_{(m_0)}}(p) = m_0\, p^{m_0 - 1}$.

In the first integral $p \le p'$, that is, $m_0 + j_0$ hypotheses, including the $m_0$ true hypotheses,
are rejected. Thus, $\mathrm{sFDP} = \frac{V}{s(R)} = \frac{m_0}{s(m_0 + j_0)}$.
The first integral becomes

$$\frac{m_0}{s(m_0 + j_0)} (p')^{m_0}.$$

By the definition of $j_0$, we deduce that

$$\frac{m_0}{s(m_0 + j_0)} (p')^{m_0} = \frac{m_0}{s(m_0 + j_0)} \left(\frac{s(m_0 + j_0)}{m + 1}\gamma\right) (p')^{m_0 - 1} = \frac{m_0}{m + 1}\gamma\, (p')^{m_0 - 1}.$$

Now, for the second part, consider p in a segment $r_j < p < r_{j+1}$ with $j \ge j_0$. When both
true and false hypotheses are considered together by their ordered p-values, the hypothesis
$H_{(i)}$ can be rejected only if there exists k, $i \le k \le m_0 + j - 1$, such that
$p_{(k)} \le \frac{s(k)}{m + 1}\gamma$, or equivalently

$$\frac{p_{(k)}}{p} \le \frac{s(k)}{m_0 + j - 1} \cdot \frac{m_0 + j - 1}{p \cdot (m + 1)}\gamma.$$

When conditioning on $q_{(m_0)} = p$, each random variable $q_i/p$, for $i = 1, 2, \ldots, m_0 - 1$, has a
uniform U(0, 1) distribution. On the other hand, $r_i/p$ for $i = 1, \ldots, j$ are random variables
situated between 0 and 1 (not necessarily of uniform distribution). Using the last inequality
to test $m_0 + j - 1$ hypotheses is equivalent to using the control procedure with the constant
$\gamma' = \frac{m_0 + j - 1}{p \cdot (m + 1)}\gamma$. Applying the induction hypothesis, we have

$$E(\mathrm{sFDP} \mid P_{m_0+1} = r_1, \ldots, P_m = r_{m_1}, q_{(m_0)} = p) \le \frac{m_0 - 1}{m_0 + j - 1} \cdot \frac{m_0 + j - 1}{p \cdot (m + 1)}\gamma = \frac{m_0 - 1}{p \cdot (m + 1)}\gamma. \tag{14}$$

The bound in inequality (14) depends on p, but not on the segment $r_j < p < r_{j+1}$ for which
it was evaluated, so